We discuss the unique capabilities of programmable logic devices (PLD's) for experimental quantum
optics and describe basic procedures of design and implementation. Examples of advanced
applications include optical metrology and feedback control of quantum dynamical systems. As a 
tutorial illustration of the PLD implementation process, a field programmable gate array (FPGA) 
controller is used to stabilize the output of a Fabry-Perot cavity. 
I. INTRODUCTION

Automatic controllers are pervasive in experimental 
physics. Servos typically play a role behind the scenes, 
stabilizing environmental conditions (e.g. temperature,
frequency and amplitude of driving lasers) for the physical
system of primary interest (e.g. quantum dots,
trapped atoms or molecules). But the system of interest
can itself be the explicit object of sophisticated control 
strategies. An increasing number of experimental quan- 
tum systems are developing to the point where coherent 
dynamics occur at a time scale longer than that of available
detectors and actuators [?, ?]. This separation of
time scales opens the door for real-time feedback control 
to be applied in quantum-mechanical scenarios. 

New theoretical and experimental tools will be re- 
quired to achieve quantum control objectives. Concerted 
efforts are currently being made to extend classical control
theory to quantum problems where back-action cannot
be ignored [?]. Given the inherent nonlinearity
of conditional quantum dynamics, optimal control laws 
cannot be practically implemented with analog circuits, 
necessitating fast digital control. Even for linear systems, 
programmable logic may be superior to analog methods 
when a precisely shaped transfer function is desired. For 
these reasons, one expects that programmable logic de- 
vices (PLD) with high processing speed and low latency 
will prove to be invaluable as quantum and classical con- 
trollers. 

PLD's are already a standard tool in industry and some 
areas of science, but they have yet to attain widespread 
use in fields such as quantum optics and quantum infor- 
mation science. Our aim in this paper will be to convey 
a base level of knowledge required to use these devices in 
representative experimental setups. First, we motivate 
the use of programmable logic with some potential ap- 
plications. We then describe the details of practical im- 
plementation, from determining the required hardware 
specifications to completing the design flow. Finally, we 
demonstrate this process with a familiar example of clas- 
sical optical control by using a Field Programmable Gate 
Array (FPGA) to lock a Fabry-Perot cavity. 
II. APPLICATIONS

An outstanding feature of PLD's is that they can im- 
plement complex non-linear logic with relatively low la- 
tency. Here 'latency' refers to the delay between the 
time that a signal is received as input and the time that 
a calculation based on it becomes available as output. 
This reaction time is of little consequence in many data- 
processing applications, but is critical in control loops. 
The control bandwidth of any servo is limited by the in- 
verse of this delay. 

In addition, most PLD's can be completely re- 
programmed in a matter of minutes, allowing for a high 
degree of design flexibility in experimental situations. 
Given a PLD with these capabilities, it is not difficult 
to imagine a variety of control applications related to 
quantum optics. Here we summarize a few potential ex- 
amples, some of which are currently being developed. 



A. Precise linear servos 

In linear control tasks, PLD controllers have a distinct 
practical advantage over analog circuitry with regard to 
precision and flexibility. For example, it is a well known 
control problem to stabilize a plant over one of its reso- 
nances. An appropriate controller should precisely com- 
pensate the measured center frequency and quality factor 
of the resonance. When creating an analog servo the de- 
signer must work with discrete components (resistors, ca- 
pacitors, etc.) whose impedances have a non-negligible 
error range. However a PLD transfer function can be 
specified digitally, making it much easier to closely match 
the system dynamics. 

Figure 1 shows the near-compensation of a harmonic
oscillator (HO) resonance with a PLD 'anti-harmonic- 
oscillator' (AHO) transfer function. (Actually, both 
transfer functions in the graph are implemented with 
a PLD by techniques described later.) Ideally, the HO 
transfer function will be transformed into an integrator 
transfer function (with a constant -90 degrees of phase) 
when multiplied by the AHO compensator. The devia- 
tion from a perfect integrator is due to a slight error in 
the assumed damping. Refinements to the AHO design 
could remove this non-ideality. 
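
The compensation idea can be checked numerically with a minimal sketch. The second-order HO model, the AHO form (inverse HO times an integrator), the 1 kHz resonance, and the damping values below are all assumptions chosen for illustration; they are not the transfer functions actually loaded onto the device.

```python
import cmath
import math

def ho(s, w0, zeta):
    """Assumed harmonic-oscillator model: w0^2 / (s^2 + 2*zeta*w0*s + w0^2)."""
    return w0**2 / (s**2 + 2 * zeta * w0 * s + w0**2)

def aho(s, w0, zeta):
    """Assumed compensator: inverse of the HO model multiplied by an integrator w0/s."""
    return (s**2 + 2 * zeta * w0 * s + w0**2) / w0**2 * (w0 / s)

def product_phase_deg(f, w0, zeta_plant, zeta_assumed):
    """Phase (degrees) of HO * AHO at frequency f in Hz."""
    s = 2j * math.pi * f
    return math.degrees(cmath.phase(ho(s, w0, zeta_plant) * aho(s, w0, zeta_assumed)))

w0 = 2 * math.pi * 1.0e3  # hypothetical 1 kHz resonance

# Perfectly matched damping: the product is a pure integrator (-90 degrees everywhere).
matched = [product_phase_deg(f, w0, 0.05, 0.05) for f in (100.0, 1.0e3, 1.0e4)]

# Slightly wrong assumed damping: the phase deviates from -90 degrees near resonance.
mismatched = product_phase_deg(950.0, w0, 0.05, 0.06)
```

With matched damping the phase stays at -90 degrees across the band, while a small damping error produces a deviation of a few degrees close to the resonance, which is the kind of non-ideality described above.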

PLD's will obviously not replace every linear servo in
the typical laboratory, but the ability to optimize the
stability of critical laser systems (for example) is a considerable
resource. We detail the use of a PLD controller
to optimally perform a linear control task in a later section.

FIG. 1: The blue plot is a harmonic oscillator (HO) transfer
function and the red plot is the anti-harmonic-oscillator
(AHO) transfer function. The product of the two should resemble
an integrator transfer function (green) with a constant
-90 degree phase.



B. Optimal measurement 

In quantum feedback scenarios, either the measure- 
ment operators or the system Hamiltonian can be mod- 
ulated in real time according to the information gained 
from a continuous measurement record. 

Consider the case where only the measurement oper- 
ators are adjusted. The goal of the entire measurement 
may be to most accurately determine the initial state of 
the system. Other situations may call for the measure- 
ment of only a single state parameter, where all other 
state variables are either assumed or neglected. The 
authors are currently developing a system of this type 
where the goal is to optimally measure the phase of a 
single pulse of light. We constrain ourselves to measur- 
ing pulses that are long enough to have their phase be 
well defined and also long enough to allow us to feed back
the measurement signal multiple times before the pulse 
has been completely destroyed by the detectors. 

Wiseman et al. have determined close-to-optimal measurement
schemes for this system based on quantum trajectory
theory [?]. In short, they consider the signal to
be measured in an adaptive homodyne set-up where the
pulse is mixed with a strong local oscillator whose phase,
$\Phi$, is continuously adjusted (within the duration of each
pulse) according to the measured homodyne current, $I$.
To first order, the job of the algorithm is to lock to the
side of the interference fringe, thus $\Phi$ is adjusted until $I$
is zero.



Despite this simplistic description, the general optimal
algorithm [?] is a highly non-linear function
based on state estimation. It has been shown that the
estimated state at any time is a function of only two parameters
and the initial conditions. In terms of a scaled
time $v$, the parameters are
The phase of the local oscillator is usually taken to be
$\Phi(v) = \phi(v) + \pi/2$ where $\phi(v)$ is the phase estimate to
be used during the course of feedback. If one were to
stop the feedback at any time, the best phase estimate
would be $\phi_C(v) = \arg(C_v)$ where $C_v = A_v v + B_v A_v^*$.
However, for subtle reasons, $\phi_C(v)$ should not be used as
the estimate during the course of the feedback.

One simple algorithm uses $\phi(v) = \arg(A_v)$. With this
choice, the algorithm simply reduces to a gain-scheduled
integrator of the form

$$\frac{d\phi(v)}{dv} = \frac{I(v)}{\sqrt{v}} \qquad (3)$$

where $v$ is the time since the beginning of the pulse and
the $1/\sqrt{v}$ factor represents the effective gain. Currently,
this algorithm is being implemented with an FPGA that
creates the $1/\sqrt{v}$ gain factor with a look-up table representation
of the function as described in a later section.

More sophisticated algorithms (with optimal performance
for certain squeezed states) have been proposed
that use feedback of the form

$$\phi(v) = \arg\left(C_v^{1-\epsilon(v)} A_v^{\epsilon(v)}\right) \qquad (4)$$

where $\epsilon(v)$ is also a function of $A_v$ and $B_v$. In this case,
the algorithm is sufficiently complex that any analog implementation
would be extremely difficult to design.

In any case, the non-linear, low latency behavior of
PLD's suggests that they are a suitable tool for this task.
Given that the form of a desired algorithm may change 
frequently with the introduction of realistic experimental 
complications, the rapid prototyping allowed by a PLD 
is also extremely convenient. 



C. Feedback control 

When the goal is control rather than optimal measure- 
ment, a non-trivial Hamiltonian of the system will be con- 
trolled by the measurement record. Consider the case of 
an atom drifting through the light field of a small Fabry- 
Perot cavity. As has been demonstrated, the position of 
the atom may be imprinted onto the output light of the 
cavity [?]. This information can potentially be mapped
back onto the intensity and phase of the input laser with
the goal of trapping the atom in the cavity for extended
periods of time [?].

Optimal control of the atom's position will require 
a complex predictor-corrector structure in the feedback 
loop at $\mu$sec time scales. If the associated calculations
can be sufficiently reduced, a PLD with effective clocking
speeds above a MHz will be able to perform this task. Of
course, the effectiveness of the control algorithm will de- 
pend on the assumed dynamics of the system from which 
it is derived. If the system needs to be described quantum 
mechanically, we should institute a conditional quantum 
state estimator. If a classical description is sufficient, we 
can use a less complicated algorithm. The performance 
of different controllers will be a strong indicator of the 
validity of our descriptions. The ability to quickly re- 
design the PLD will be particularly advantageous when 
exploring this boundary. 

Hamiltonian feedback can also be used to manipulate 
the internal states of atomic and molecular systems. Numerous
groups have become interested in shaping femtosecond
laser pulses to drive transitions which may be
inaccessible using traditional means [?]. This includes
the ability to synthesize rare molecular compounds. For 
example, by iteratively reading the fluorescence spectrum 
of the system and intelligently moving in the parameter 
space of the pulse shape, one attempts to land at a shape 
conducive to creating the desired state or compound. 

This procedure can happen in two regimes, 'learning 
control' or 'feedback control'. For learning control we 
consider using a new sample for every pulse, whereas for 
feedback control we consider using the same sample on 
every pulse. In the latter case, the algorithm assumes 
that the sample has a long enough dephasing time (mem- 
ory) that a significant degree of coherence is retained be- 
tween pulses. For either case, especially the second, a 
PLD based controller may have significant advantages 
over alternative controller architectures. 



D. Decision and control for quantum information 
processing 

In a generic quantum computing architecture, there 
exist classical logic steps which involve performing a co- 
herent quantum operation conditioned on the result of 
a measurement. For example, quantum error correcting
codes can combat decoherence by mapping measured
errors to appropriate correction operators [?]. In an experiment,
this measurement-operation procedure should
be performed much faster than the dephasing rate of the 
system. If the operations can be performed quickly upon 
command, PLD's will be able to orchestrate these codes 
in a reliable and reconfigurable fashion with minimal de- 
lay. 

Even for non-conditional algorithms, PLD's can 
streamline the implementation of complex instruction 
sets. In particular, groups working on ion trap computing
have developed means of performing entanglement
algorithms but with an extensive overhead of
macroscopic equipment that requires detailed manual ad- 
justment whenever the algorithm is changed. Without 
pushing its computational limits, a PLD can be made 
to streamline such logic networks. By using software de- 
fined algorithms, the users eliminate the time and risk 
of error associated with manual realignment of network 
components. Commercial magnetic resonance systems 
use PLD's for similar reasons. 

As quantum computing architectures grow to the point 
where conditional and non-conditional algorithms must 
be integrated in a way that is fast and flexible, pro- 
grammable logic will be able to handle the task in a 
convenient manner. 

The success of any PLD controller will depend on its 
dynamic range and effective bandwidth. Next we dis- 
cuss in more practical terms what levels of system perfor- 
mance can be reasonably expected from currently avail- 
able PLD's. 



III. DESIGN 
A. Hardware 

Once it is determined that a control algorithm needs to 
be implemented digitally, a designer is confronted with 
a wide array of possible controllers and corresponding 
acronyms. In addition to PLD's, the options include con- 
ventional microprocessor systems, DSP's (digital signal 
processors), and ASIC's (application specific integrated 
circuits). Of course, the choice of controller is highly 
dependent on the algorithm being implemented because 
each device has its own trade-offs. Microprocessor sys- 
tems are general enough to allow for a simple means of 
programming complex algorithms. However, these sys- 
tems rely on a single bus architecture which forms a 
significant bottleneck in signal processing applications. 
Overall throughput may be high, but a large delay lim- 
its typical controllers to slow applications with kHz scale 
bandwidths. In addition, unreliable operating systems 
may present undesirable interrupt signals during critical 
stages of processing. DSPs are specialized microproces- 
sor systems with a multiple bus design that are optimized 
for signal processing applications. Due to their parallel 
architecture, DSP's can attain low-latency performance, 
but require a significant degree of high-level design exper- 
tise. ASIC's are like PLD's in that the user designs them 
from the gate level, but ASIC's are irreversibly hard- 
wired with a single application. While PLD's generally 
have fewer resources available than ASIC's, they offer an 
efficient parallel computation structure along with reprogrammability
and a relatively simple design process.

The market for PLD's is currently dominated by two 
companies: Xilinx and Altera. Devices from both compa- 
nies have had extensive product development in industry, 
thus a substantial support network is available to designers.
In choosing between PLD companies, several factors
beyond the chip performance need to be considered,
including the quality of the associated software environ- 
ments. To obtain the maximum control bandwidth, we 
chose to work with a Field Programmable Gate Array 
(FPGA) from Xilinx. 

The logic structure of a Xilinx FPGA is designed to 
handle arbitrary algorithm architectures. The FPGA 
mostly consists of a grid with thousands of Configurable 
Logic Blocks (CLB) connected by programmable inter- 
connections. Each CLB contains a few small look-up 
tables which can serve as simple logic elements (AND,
OR, etc.) when programmed. Also interspersed in this
grid are larger blocks of RAM that can be programmed 
as user defined functions with a large domain and range. 
Since each logic element needs to be triggered to operate, 
the distribution of a uniform clock signal with constant 
frequency and phase is a considerable design issue. Thus 
FPGA architectures commonly have digital clock man- 
agers (DCM) or delay locked loops (DLL) that de-skew 
the clock signal across the device. 

The performance of FPGA architectures has been im- 
pressively increasing in recent years. To give a current 
indication of their level of performance, we quote some 
of the characteristics of one of the top of the line devices
available on the market today. The Xilinx Virtex
II can contain up to 10 million system gates and have
an internal clock frequency ($f_c$) up to 420 MHz. The
input-output speed can be above 840 Mb/s which roughly 
matches the maximum speed of the best analog to dig- 
ital converters (100 MSPS for a 12 bit sample Analog 
AD9432). This same FPGA has up to 192 SelectRAM 
blocks of 18 kbit each. Because a strong demand from 
industry drives the development of FPGA technology, 
these performance specifications will likely improve sig- 
nificantly in the short term future. 

Of course these devices must be coupled to a board, in- 
troducing other practical issues. The system used in the 
cavity lock described below is a GVA-290 board (G.V. & 
Associates) with two Xilinx Virtex-E XCV1000E FPGA 
chips. Signals enter and exit the board through four in- 
put and four output SMA connectors. The signals are 
digitized by an ADC (Analog AD9432) at the input and 
converted back to analog by a DAC (Analog AD9762) 
at the output. Each ADC is located on a detachable 
daughter board, allowing for converter upgrades and the 
addition of customized components and filters. Both the 
ADCs and DACs have 12 bit resolution and are driven at 
the clock speed of 100 MHz. A crystal oscillator provides 
the clock signal to the FPGA, which distributes a syn- 
chronized signal internally with DLLs and also outputs 
the driving signal for the ADC and DAC at a controlled 
phase. Unlike standard models, the board was ordered 
with DC coupled inputs, allowing us to have broadband 
control to DC. Boards often come with anti-aliasing analog
filters, but these were not included here due to the substantial
group delay a high-order filter can impose on the
signal. The cost of this particular board including devices
is approximately $10,000, but it should be stressed
that functional systems could be assembled at far less
cost.

FIG. 2: The amplitude response and delay of the entire GVA-290
board (ADC -> FPGA -> DAC). Notice that the delay
below the Nyquist frequency ($f_c/2$ = 50 MHz) is ~ 160 ns.
The phase response in the constant delay region is linear with
slope proportional to the delay.

Xilinx also offers a special academic program through 
which university researchers can obtain the necessary 
software environment and a limited range of hardware 
products. 

We can now discuss the latency and throughput of our 
controller in more detail. The latency is defined as the 
amount of time for an algorithm to process a single sam- 
ple all the way through. The throughput is defined as 
the number of samples (or bits) per second being output 
from the device. For example, consider a system of $N$
components in series, each with the same sampling rate
$f = 1/\tau$. Also assume the system is 'pipelined', meaning
that a new sample is loaded every $\tau$ seconds and samples
are registered (values held) in-between components.
In this case, the latency is $N\tau$, while the throughput is
$f$. If this were a controller, the bandwidth of control
would be limited to the inverse of the latency, not
the throughput.
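
The distinction between latency and throughput in a pipelined chain can be made concrete with a small sketch. The function name and the example stage counts are hypothetical; the clock period of 10 ns is only an illustrative value.

```python
def pipeline_timing(n_stages, tau):
    """Return (latency, throughput) for n_stages registered stages, each taking tau seconds."""
    latency = n_stages * tau   # time for one sample to traverse every stage
    throughput = 1.0 / tau     # a new sample still emerges every tau seconds
    return latency, throughput

# Hypothetical chains clocked with a 10 ns period.
short_latency, short_rate = pipeline_timing(n_stages=5, tau=10e-9)
long_latency, long_rate = pipeline_timing(n_stages=10, tau=10e-9)

# The control bandwidth is bounded by the inverse latency, not by the sample rate.
short_bandwidth = 1.0 / short_latency
long_bandwidth = 1.0 / long_latency
```

Doubling the number of stages leaves the throughput unchanged but halves the bound on the control bandwidth, which is why serial depth matters more than raw sample rate in a feedback loop.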

One of the principal advantages of FPGA technology
is that the delay can be quite small. Consider the case 
where the FPGA of the GVA-290 board is programmed 
to pass a signal through without any manipulation. Figure 2
shows the transfer function and delay of this configuration.
The ADC, FPGA, and DAC are all clocked at
100 MHz and each one takes a certain number of cycles 
(10 ns/cycle) to perform its function. The ADC imposes 
a delay of 10 cycles, the buffers of the FPGA impose a 
delay of 4 cycles, and the DAC only delays the signal 
about 1 cycle. Adding all this to a small delay from 
other components, we find that below the Nyquist frequency
($f_c/2$ = 50 MHz) the signal passes through at
unity gain with a constant overall delay of ~ 160 ns.
Thus the maximum control bandwidth for this device is
~ 6 MHz, and bandwidths in the tens of MHz may be
anticipated with newer versions. If the FPGA algorithm 
is simple enough that the ADC dominates the delay, it 
may be desirable to use Flash ADCs that have less la- 
tency at the expense of a larger power consumption and 
smaller number of output bits. 
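
The delay budget quoted above can be tallied in a few lines. The cycle counts and the 160 ns total are taken from the text; the dictionary keys and variable names are hypothetical, and the remainder attributed to other components is simply the difference between the two figures.

```python
CLOCK_HZ = 100e6  # ADC, FPGA and DAC clock rate quoted in the text

# Clock cycles spent in each stage of the pass-through configuration.
cycles = {"adc": 10, "fpga_buffers": 4, "dac": 1}

clocked_delay = sum(cycles.values()) / CLOCK_HZ  # 15 cycles at 10 ns per cycle
measured_delay = 160e-9                          # overall delay quoted for the board
other_delay = measured_delay - clocked_delay     # remainder from other components

max_bandwidth_hz = 1.0 / measured_delay          # inverse-latency bound on the servo
adc_fraction = cycles["adc"] / sum(cycles.values())
```

The inverse of 160 ns is 6.25 MHz, consistent with the ~ 6 MHz figure, and the ADC accounts for two thirds of the clocked delay, which is why a lower-latency converter is the first thing to consider.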

If the FPGA performs a complex calculation that re- 
quires multiple logical steps in series, the delay is in- 
creased by an integer number of cycles and the effective 
bandwidth suffers. A typical example is that of the FIR 
filter mentioned below where, for $B_u$ input bits, the sampling
rate becomes $f_c/B_u$. For any general algorithm,
care should be taken to minimize the number of serial 
elements before implementation. If possible, calculations 
should be performed in parallel and look-up tables should 
be used to evaluate complicated functions. 



B. Software 

The design process for a particular algorithm has been 
largely automated with implementation software environ- 
ments like Foundation ISE (Xilinx). Once the design 
is entered via one of the options described below, the 
program steps through a series of compilation tasks be- 
fore downloading onto the device. In order, the design is 
analyzed for syntactic errors, synthesized into a generic 
circuit, and implemented into an optimal bit stream ap- 
propriate to the particular device and board. The bit 
stream is then downloaded onto the device to achieve a 
stand-alone realization of the desired algorithm. Simu- 
lation programs are available at intermediate stages for 
debugging purposes. The latest version of Foundation 
ISE (4.1) compiles up to 100,000 gates/min. For reason- 
able designs, an entire design flow can be expected to 
take about 10 minutes. This allows for a rapid prototyp- 
ing cycle which is one of the most desirable features of 
this technology. 

Numerous algorithm entry options are available. Us- 
ing a library of primitive components, one can create a 
schematic of the desired circuit. Abstract finite state ma- 
chine diagrams can also be interpreted. The third option 
is a text based design written in either Verilog or VHDL 
(VHSIC Hardware Description Language).

As is common in technology standards, the choice of 
Verilog vs. VHDL has become a religious one for every- 
day practitioners. It is worth pointing out some of the 
accepted differences between the languages. Verilog is 
generally regarded as being easier to learn. A strong ma- 
jority of engineers implementing commercial systems use 
Verilog. Historically, VHDL was meant as a description 
language before being adopted as a means of synthesis. 
As a result, VHDL is a much more strongly 'typed' lan- 
guage. The range of abstraction is also different between 
the two languages. Although there is a considerable over- 
lap, Verilog extends to a lower level of abstraction while 
VHDL extends to a slightly higher level. For non-critical 
reasons, we chose to design in VHDL, hence we will discuss
the following designs in those terms. However, the
discussion is abstract enough that most concepts apply 
to both languages. 

To first order, VHDL is a text based description of a 
schematic design. The mapping between input and out- 
put bus variables consists of a series of abstractly defined 
components where output ports are connected to input 
ports with defined signal variables. Each component has 
an associated 'entity' and 'architecture', where an archi- 
tecture is an instantiation of an entity. For example, a 
component with entity 'op-amp' (with only input and 
output ports defined) could have its functionality determined
by the particular architecture 'op27'. The internal
workings of a particular architecture can be specified
in another VHDL file with more components that are defined
elsewhere. In this way, the code lends itself nicely
to nested levels of detail and organized project design.
Also one can easily swap out components by changing 
architectures, but not entities, within the code. 

At some point in the hierarchy, primitive components 
must be called upon. The Xilinx software offers an ex- 
tensive library of such components (AND, OR, etc.) for 
use with each particular device. In addition to these ba- 
sic primitives, one can also create more complicated, but 
commonly used, components with the Xilinx 'Core Gen- 
erator'. These objects (adders, multipliers, filters, DSP 
elements) can be customized with user specified param- 
eters. 

Each component loads inputs and returns outputs trig- 
gered by an input clock signal. Hence, when designing 
in VHDL one thinks in terms of circuit diagrams where, 
on every clock cycle, events happen concurrently across 
the device. On the other hand, in traditional C-like com- 
puter languages events progress in a serial manner. At 
times, serial logic is convenient and in fact VHDL of- 
fers a restricted form of serial logic in a form known as a 
'process'. These processes are bits of C-like code that ex- 
ecute when triggered. Inside a process, variables can be 
manipulated with functions defined in other VHDL files. 
However, a signal can only be changed once within a pro- 
cess. For this and other reasons, processes are best used 
as referees to generate secondary triggering signals and 
logic. While processes can perform some level of math, 
the heavy lifting is best left to the components which 
have been streamlined for such purposes. 

An appropriate use of a process is to initialize parameters
and control timing. For example, Figure 3 demonstrates
how the simple adaptive phase algorithm mentioned
above is implemented. Both the VHDL and an
equivalent schematic are shown. The photocurrent, $I$,
enters the device and is multiplied by the time dependent
gain factor, $G(t) = 1/\sqrt{t}$, which is created by sending the
time signal, $t$, through a look-up table (described below).
The resulting signal, $d\phi(t) = I(t)/\sqrt{t}$, is then sent to one port
of an adder, with the other input port being wired to the
output signal $\Phi(t)$. Because the output is connected to
the input with a delay, the adder serves as an integrator
and executes the relation $\Phi(t) = \Phi(t-1) + d\phi(t)$ at every
VHDL Equivalent (the symbol -- precedes comments)

-- first component is the look-up table
-- component format is 'instance: type'
-- port map plugs signals into component ports; _# is label for bit size of bus
lut_num1 : ramblock_core
    port map (EN => vcc_sig, WE => gnd_sig, RST => gnd_sig, CLK => clksys,
              ADDR => time_8, DO => Gtime_16, DI => Gtime_16);

multiplier_num1 : multiplier_core
    port map (A => I_12, B => Gtime_16, CLK => clksys, P => dphi_28);

-- trim signal back down to size
dphi_12 <= dphi_28(27 downto 16);

adder_num1 : adder_core
    port map (A => dphi_12, B => phi_21_a, Q => phi_21_b, CLK => clksys);

-- plug signals together
phi_21_c <= phi_21_b;

-- start process on clock change
PROCESS(clksys)
    VARIABLE time: integer;
BEGIN
    -- trigger on rising edge of clock
    IF clksys = '1' AND clksys'EVENT THEN
        IF time < tau_experiment THEN
            phi_21_a <= phi_21_c;
            phi_12 <= phi_21_c(20 downto 9);
        ELSE
            -- zero signals during dead time
            phi_21_a <= "000000000000000000000";
            phi_12 <= "000000000000";
        END IF;

        IF time = tau_experiment + tau_dead THEN
            time := 0;
        END IF;

        time := time + 1;
        -- convert variable to signal
        time_8 <= int_to_bus(time);
    END IF;
END PROCESS;

FIG. 3: FPGA schematic and corresponding code for the 
adaptive phase measurement algorithm. In the schematic the 
process is not represented as a block component because it is 
coded in a serial manner. 



time step. The 'process' plays an important role in this
algorithm by initializing the integral value and creating
the time signal. At the beginning of the pulse (integration),
the process initializes $t$ and $\Phi$ to zero. On every subsequent
clock signal, the process increments $t$ by one and
lets the adder integrate up the signal. At the end of the
pulse, the process waits for the next pulse then repeats
the sequence. Figure 4 shows the algorithm in action.
Through the integrator structure, $\Phi$ is adjusted until $I$
is locked to zero. The overshoot is a result of the FPGA
delay.
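
A behavioral model of this integrator can help when checking bit widths before synthesis. The sketch below is an open-loop software analogue only, not the VHDL design itself: the names `gain_lut` and `run_pulse`, the pulse length, and the scaling of the gain table to a full 16-bit word are all assumptions. The 8-bit time address, 16-bit gain word, and the 16-bit trim of the product follow the signal names in the listing (time_8, Gtime_16, dphi_28).

```python
import math

TAU_EXPERIMENT = 200  # hypothetical pulse length in clock cycles
GAIN_BITS = 16        # width of the look-up table output word
ADDRESS_BITS = 8      # width of the time address

# Look-up table for G(t) = 1/sqrt(t), scaled so that G(1) fills the 16-bit word.
# Address 0 is never read because time starts counting at 1.
FULL_SCALE = 2**GAIN_BITS - 1
gain_lut = [0] + [round(FULL_SCALE / math.sqrt(t)) for t in range(1, 2**ADDRESS_BITS)]

def run_pulse(photocurrent):
    """Integrate d_phi = G(t) * I(t) over one pulse of signed 12-bit samples."""
    phi = 0
    trace = []
    for t, current in enumerate(photocurrent[:TAU_EXPERIMENT], start=1):
        product = current * gain_lut[t]  # wide multiplier output
        d_phi = product >> GAIN_BITS     # keep only the upper bits of the product
        phi += d_phi                     # adder with feedback acts as an integrator
        trace.append(phi)
    return trace

# Open-loop check with a constant photocurrent.
trace = run_pulse([1000] * TAU_EXPERIMENT)
```

With a constant input the increments shrink as $1/\sqrt{t}$, so the trace shows directly how the scheduled gain falls over the pulse. Closing the loop would require a model of the interferometer, which is left out here.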

A single measurement using this algorithm is shown in
Figure 4. Here the 'pulse' is a 50 $\mu$sec long time slice of a
weak cw coherent beam. The feedback algorithm is sampling
at 100 MHz with a delay less than 1 $\mu$sec. Because
of the delay and other bandwidth limiting components in
the loop, our effective feedback bandwidth is limited to
~ 1 MHz.

FIG. 4: The $\Phi(t)$ and $I(t)$ trajectories for the phase measurement
of a single pulse of light. The current is locked to zero
and the ending point of the phase is a rough estimate of the
measured phase. The true phase measurement is a functional
of both traces. The small oscillations are due to the delay in
the loop.

As will be demonstrated below, Matlab plays a com- 
plementary role in the design process. It can be used to 
create the necessary coefficients and memory blocks used 
as parameters in the VHDL components. In particular, 
the Control and DSP toolboxes provide relevant func- 
tionality. Also, Simulink is a good tool for simulating 
the associated experiments, where delays and other re- 
alistic factors can complicate the dynamics. There exist 
software packages that attempt to directly translate from 
a Simulink design of an algorithm into equivalent VHDL, 
but these packages remain in early stages of development. 

Due to their extensive utility, RAM look-up tables and 
filter components are worth discussing in greater detail. 

1. Look-Up Tables

Most FPGA chips come equipped with large blocks of 
internal RAM that can be used as generalized functions 
or look-up tables (LUT). Given an amount of memory on 
a particular block, the user can decide on a certain num- 
ber of input and output bits. During operation the RAM 
block returns the value held at the address specified by 
the input, effectively implementing the desired function. 
For example, on the XCV1000E, 160 blocks of 4096 =
$2^{12}$ bits are available for internal use. (As noted above,
the Virtex II devices have much larger 18 kbit blocks.)
To make one block behave as the function $f$ with $B_i$
input bits, the designer would choose the output to be
$B_o = 2^{12 - B_i}$ bits. Possible partitions are $(B_i, B_o) \in$
{(1, 2048), (2, 1024), (3, 512), ..., (8, 16), ..., (12, 1)}. Once a
partition is chosen, the designer would use Matlab to
define a block of data consisting of $2^{B_i}$ values each of
size $B_o$ bits, and use this block of data as a parameter
in the VHDL LUT component. If the discretization is a 
problem, more RAM blocks can be used to represent the 
function. If desired, the memory of a RAM block can 
also be dynamically written during operation. With this 
ability, an algorithm could easily adapt itself according 
to the signals it receives. Both the read and write op- 
erations (from/to one RAM address) only take a single 
clock cycle. 
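
The partitioning rule and the generation of table contents can be sketched as follows. The text uses Matlab for this step; Python is substituted here purely for illustration, and the function names, the choice of a square-root function, and the input and output scaling are assumptions.

```python
import math

BLOCK_BITS = 4096  # one XCV1000E RAM block, 2**12 bits

def partitions(block_bits=BLOCK_BITS):
    """List the (input bits, output bits) pairs that exactly fill one RAM block."""
    total = block_bits.bit_length() - 1  # 12 for a 4096-bit block
    return [(b_in, block_bits >> b_in) for b_in in range(1, total + 1)]

def build_lut(func, b_in, b_out, x_max, y_max):
    """Sample func at 2**b_in addresses over [0, x_max], quantized to b_out-bit unsigned words."""
    n_addresses = 2**b_in
    full_scale = 2**b_out - 1
    table = []
    for address in range(n_addresses):
        x = x_max * address / (n_addresses - 1)
        y = min(max(func(x), 0.0), y_max)  # clip to the representable range
        table.append(round(y / y_max * full_scale))
    return table

pairs = partitions()

# Example: a square-root function stored with 8 input bits and 16 output bits.
sqrt_table = build_lut(math.sqrt, b_in=8, b_out=16, x_max=1.0, y_max=1.0)
```

Every pair returned by `partitions` satisfies $2^{B_i} B_o = 4096$, so the trade between input resolution and output resolution is explicit when choosing how to discretize a function.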

As mentioned above, these LUT functions play an ex- 
tremely important role in speeding the functionality of 
non-linear algorithms. The application may be as simple 
as non-linear gain-scheduling of a controller or as compli- 
cated as full quantum-mechanical state estimation with 
the LUT performing functions based on assumed system 
parameters. In general, it is a matter of judgment how to 
partition complex algorithms, but any optimal partition 
will likely involve the use of these LUTs to perform the 
difficult parts of the calculation with minimal time delay. 



2. Filters 

PLD's have a clear edge over analog circuitry in non- 
linear processing, but they also have a potential advan- 
tage in implementing precise, generic linear filters and 
transfer functions. 

A standard core element offered by Xilinx is the FIR
(Finite Impulse Response) filter. The FIR is defined in
discrete time as

$$ y(n) = \sum_{i=0}^{N} a(i)\,u(n-i) \tag{5} $$

where $y(n)$ and $u(n)$ are the output and input at the
discrete time $n$, respectively. With standard Matlab
functions (firls, remez) one can specify an arbitrary
amplitude response and obtain the corresponding $a(i)$
vector. The sampling frequency for a FIR element is
$f_F = 1/T_F = f_{\mathrm{clk}}/B_u$, where $f_{\mathrm{clk}}$ is the
clock frequency and $B_u$ is the number of bits chosen to
represent $u(n)$. Of course, the filter is useless at shaping
the response above this frequency. The group delay of
the signal through the filter is approximately $T_F$.
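
To make the FIR definition concrete, the following is a minimal Python sketch that evaluates the sum over $a(i)\,u(n-i)$ directly. It is not the Xilinx core: the function name and the 5-tap moving-average coefficients are placeholders chosen for illustration, and in practice the $a(i)$ would come from a design routine such as firls.

```python
def fir_filter(a, u):
    """Evaluate y(n) = sum_{i=0}^{N} a(i) u(n - i), taking u(n) = 0 for n < 0."""
    y = []
    for n in range(len(u)):
        acc = 0.0
        for i, coeff in enumerate(a):
            if n - i >= 0:
                acc += coeff * u[n - i]
        y.append(acc)
    return y

# Hypothetical 5-tap moving average (N = 4), used only as a placeholder design.
a = [0.2] * 5
impulse_response = fir_filter(a, [1.0] + [0.0] * 7)
step_response = fir_filter(a, [1.0] * 8)
```

The impulse response simply reproduces the coefficient vector and then returns to zero, which is the sense in which the response is finite.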

The range of attenuation is also a concern in the design
of any filter. For an FPGA with $B_F$ bits entering
and leaving, the dynamic range is $20 \log_{10}(2^{B_F})$ dB. For our
board with 12 bit ADC/DAC inputs and outputs, this
corresponds to roughly 70 dB. The designer should also have a
sense of the size of the input and output signals. If the 
input signal is too high, the FPGA will rail; if the input is 
too low, it will fail to rise above the smallest bit size. To 
avoid these types of problems, broadband gain elements 
can be used at the input and output of the FPGA board. 
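
As a quick check of the dynamic range figure, each bit of word length contributes $20\log_{10} 2 \approx 6.02$ dB, so for a 12 bit converter

$$ 20\log_{10}\left(2^{12}\right) = 12 \times 20\log_{10} 2 \approx 12 \times 6.02\ \mathrm{dB} \approx 72\ \mathrm{dB}, $$

which agrees with the approximate value of 70 dB quoted for the board.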

A drawback of the FIR design is that it cannot be used
to control the phase response of its transfer function. On
the other hand, a generic continuous time linear transfer
function

$$ G_C(s) = \frac{c(N)s^N + c(N-1)s^{N-1} + \ldots + c(1)}{d(N)s^N + d(N-1)s^{N-1} + \ldots + d(1)} \tag{6} $$

where $Y_C = G_C U_C$, has phase control built in through
the denominator. To approximate this function on a
PLD, an Infinite Impulse Response (IIR) filter needs to
be used.

FIG. 5: Implementation of IIR filter. 'T' components trim a
certain number of least significant bits from the data bus.

One possible IIR design process illustrates this need.
To generate a digital IIR design, first create $G_C(s)$ using
standard control techniques (Nyquist, LQR, etc.). Next,
convert from a continuous to a discrete transfer function

$$ G_C(s) \rightarrow G_D(z) = \frac{a(0) + a(1)z^{-1} + \ldots + a(N)z^{-N}}{b(0) + b(1)z^{-1} + \ldots + b(N)z^{-N}} \tag{7} $$

with the Matlab function c2d. We have used the definition
$Y_D = G_D U_D$ in the discrete time representation.
Apply the inverse z-transform ($z^{-1} \Rightarrow$ unit delay) to create the
discrete time difference equation



$$ y(n) = \sum_{i=0}^{N} a(i)\,u(n-i) - \sum_{i=1}^{N} b(i)\,y(n-i) \tag{8} $$

with the definition $b(0) = 1$. Finally, implement the difference
equation in hardware as in Figure 5 with 2 FIR
blocks and 1 adder.
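
The design process above can be sketched end to end for the simplest case. The Python below is an illustration under stated assumptions, not the hardware implementation: it replaces the call to c2d with a bilinear (Tustin) transform worked out by hand for a first-order low-pass $G_C(s) = 1/(1 + s/\omega_c)$ (c2d itself defaults to a zero-order hold), and the function names, corner frequency, and sample rate are chosen only for the example.

```python
import math

def iir_filter(a, b, u):
    """Evaluate y(n) = sum_{i=0}^{N} a(i) u(n-i) - sum_{i=1}^{N} b(i) y(n-i), with b(0) = 1."""
    if b[0] != 1.0:
        raise ValueError("coefficients must be normalized so that b(0) = 1")
    y = []
    for n in range(len(u)):
        feedforward = sum(a[i] * u[n - i] for i in range(len(a)) if n - i >= 0)
        feedback = sum(b[i] * y[n - i] for i in range(1, len(b)) if n - i >= 0)
        y.append(feedforward - feedback)
    return y

def lowpass_coefficients(corner_hz, sample_hz):
    """Bilinear discretization of G_C(s) = 1 / (1 + s / w_c), normalized to b(0) = 1."""
    k = 2.0 * sample_hz / (2.0 * math.pi * corner_hz)  # (2 / T) / w_c
    a = [1.0 / (1.0 + k), 1.0 / (1.0 + k)]
    b = [1.0, (1.0 - k) / (1.0 + k)]
    return a, b

# Hypothetical numbers: 10 kHz corner sampled at 1.5625 MHz.
a, b = lowpass_coefficients(10e3, 1.5625e6)
step_response = iir_filter(a, b, [1.0] * 400)
```

The two sums correspond to the two FIR blocks of the hardware design, and the subtraction corresponds to the adder. Setting `b = [1.0]` removes the feedback path and recovers a plain FIR filter.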

With $b(n > 0) = 0$ the filter is just a FIR filter; however,
with $b(n > 0) \neq 0$ the output is fed back to itself.
Hence an impulse at the input can affect the output
indefinitely. Of course, with internal feedback loops, the
system is potentially unstable to noise and rounding errors.
For this reason, among others, the Xilinx 'Core
Generator' does not create flexible IIR modules.

However, with careful consideration of the number of
bits required at each stage, a stable IIR filter can be
created as in Figure 5. The sampling frequency for this
simple architecture is $f_{\mathrm{clk}}/(2B_y)$, where $f_{\mathrm{clk}}$ is the
clock frequency and $B_y$ is the number of
bits used to keep track of $y(n)$ internally. The factor
of 2 results from the delay of both the adder and the
FIR element. Because of the feedback, the IIR filter can
achieve a given amplitude response with a lower number
of coefficients than the FIR filter. This means the filter
delays the signal less. Even though the IIR has fewer
coefficients than an analogous FIR filter, the coefficients
of the IIR filter have to be specified to a greater degree
of precision to achieve the same amplitude response.

FIG. 6: Feedback architecture for a Fabry-Perot Cavity. The
EOM puts sidebands on the beam necessary to generate the
locking signal. The FPGA algorithm T_upper maps the error
signal to the fast VCO-AOM frequency shifting combination.
The FPGA algorithm T_lower maps the signal to the slow
PZT.



IV. SPECIFIC EXAMPLE: CAVITY LOCK 

We now discuss the use of an FPGA to perform a clas- 
sical task necessary for low-noise experiments. High pre- 
cision optical measurements demand laser intensity noise 
be minimized as much as possible. In the adaptive phase 
experiment mentioned above, the input laser is a Light- 
wave Nd:YAG model 126 (1064 nm) with an inherent 
broad relaxation oscillation noise peak at ~ 100 kHz. To 
perform broadband detection and control near 1 MHz, 
this intensity noise must be removed from the beam with 
a Fabry-Perot cavity. 

A block diagram of the system is shown in Figure 6.
The output intensity of the cavity is stabilized with the 
standard Pound-Drever-Hall method so that the error 
signal is created from a reflected carrier beam with side- 
bands. At low frequencies (below 100 Hz) the feedback 
loop is dominated by a piezoelectric element (PZT) which 
controls the length of the cavity. At higher frequencies 
and through the closing point of the servo, the feedback 
is from an AOM (Acousto-Optic Modulator) driven by a 
VCO (Voltage Controlled Oscillator) which adjusts the 
frequency of the input beam. 

Given the control architecture of Figure 6, the design
process can be made very systematic with the flexibility
of the FPGA. Because the critical behavior of the servo
will be dominated by the VCO-AOM loop, we concentrate
on the design of $T_U$ (T_upper). First, the transfer
functions of the elements in the loop are measured. Here
we find that the VCO-AOM combination behaves like a
low-pass filter ($T_V$) with a corner at 100 kHz. The cavity
itself can be modelled as a low-pass filter ($T_C$) with a
corner at about 10 kHz (the cavity linewidth). The goal is
to design $T_U$ such that the closed loop transfer function
$T_{CL} = \frac{T_C T_V T_U}{1 + T_C T_V T_U}$ is stable.

At this point, we can use the Matlab Control Toolbox
to design an optimal $T_U$. One option is to provide
the function lqr with the state space representations of
$T_V$ and $T_C$ and an appropriate cost function to create the
optimal $T_U$. The result simply tells us to make the combination
$T_C T_V T_U$ behave like an integrator ($T_I = \frac{1}{s} = \frac{1}{j\omega}$)
such that the controller satisfies the Nyquist criterion
with 90 degrees of phase margin.

There are practical problems with this approach. In
particular, the gain of $T_U$ would have to be infinite for very low
and very high frequencies. To remedy this, we flatten the
response of $T_U$ below 100 Hz (where the PZT arm takes
over) and roll off the response at 300 kHz, beyond the
closing point of the servo. So instead of making
$T_U = \frac{T_I}{T_C T_V}$ we use $T_U = \frac{T_{LP1} T_{LP2}}{T_C T_V}$,
where $T_{LP1}$ is a low-pass
filter with the corner at 100 Hz and $T_{LP2}$ is a low-pass
filter with the corner at 300 kHz.
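
The effect of this choice on the loop can be checked numerically. The sketch below assumes the controller has the form $T_U = K\,T_{LP1} T_{LP2} / (T_C T_V)$ with every element modelled as an ideal first-order low-pass at the corner frequencies quoted in the text. The overall gain $K$ and the 30 kHz unity-gain point are hypothetical values chosen for illustration and do not come from the experiment.

```python
import cmath
import math

def lowpass(f_hz, corner_hz):
    """First-order low-pass response 1 / (1 + j f / f_corner)."""
    return 1.0 / (1.0 + 1j * f_hz / corner_hz)

# Corner frequencies quoted in the text.
F_CAVITY = 10e3     # T_C (cavity linewidth)
F_VCO_AOM = 100e3   # T_V
F_LP1 = 100.0       # T_LP1
F_LP2 = 300e3       # T_LP2

def controller(f_hz, gain):
    """T_U = gain * T_LP1 * T_LP2 / (T_C * T_V); 'gain' is a hypothetical constant K."""
    return (gain * lowpass(f_hz, F_LP1) * lowpass(f_hz, F_LP2)
            / (lowpass(f_hz, F_CAVITY) * lowpass(f_hz, F_VCO_AOM)))

def open_loop(f_hz, gain):
    """Loop transfer function T_C * T_V * T_U."""
    return lowpass(f_hz, F_CAVITY) * lowpass(f_hz, F_VCO_AOM) * controller(f_hz, gain)

def phase_margin_deg(unity_gain_hz, gain):
    """Phase margin in degrees, assuming |T_C T_V T_U| = 1 at unity_gain_hz."""
    return 180.0 + math.degrees(cmath.phase(open_loop(unity_gain_hz, gain)))

# Hypothetical unity-gain point: choose K so that the loop gain is 1 at 30 kHz.
UNITY_GAIN_HZ = 30e3
GAIN = 1.0 / abs(open_loop(UNITY_GAIN_HZ, 1.0))
margin = phase_margin_deg(UNITY_GAIN_HZ, GAIN)
```

Under these assumptions the plant poles cancel, so the loop looks like an integrator between the two corners, and the phase margin stays close to the ideal 90 degrees, reduced only slightly by the 300 kHz roll-off.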

To get high gain at frequencies below 100 Hz, we make
$T_L$ (T_lower) behave as a low-pass filter with a corner at
only a few Hz. A better choice would be to implement $T_L$
as a high-gain analog integrator, but we use the FPGA
to implement $T_L$ here for demonstration purposes.

Next, we generate proper IIR coefficients for both
paths by the method described previously, treating $T_L$
and $T_U$ as the continuous transfer function $G_C$. With a
clock frequency of 100 MHz and an internal sample size
of $B_y = 32$ bits, the IIR structure had an effective bandwidth
of about 1.5 MHz (the clock frequency divided by $2B_y$), which is
adequate to generate the critical features of the transfer function.

Figures 7 and 8 show the desired and actual transfer
functions of both arms. Each arm fails to match the 
desired phase and amplitude response in a similar way. 
First, because of the finite size of the sampling time, the 
actual phase response differs from the desired response 
as the frequency approaches the effective sampling fre- 
quency. In fact, this mismatch happens lower than the 
sampling frequency because of the delay of the IIR filter. 
Second, at low frequencies, the FPGA gives less gain than 
the desired result. This is due to the fact that we are 
dealing with finite precision coefficients. The price paid 
for having a large sampling frequency with small delay 
is that we have less control over the size of the low fre- 
quency gain. Finally, note that the PZT arm integrator 
achieves the full 70 dB of expected range (input/output
size is 12 bits). 
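
One way finite-precision coefficients can reduce the low frequency gain is illustrated below. This Python sketch is a simplified model and not the fixed-point arithmetic of the actual design: it assumes a first-order low-pass discretized with a bilinear transform, a hypothetical 3 Hz corner, a 1.5625 MHz sample rate (100 MHz divided by $2 \times 32$), and coefficients rounded to a chosen number of fractional bits.

```python
import math

def quantize(value, fractional_bits):
    """Round a coefficient to the nearest multiple of 2**-fractional_bits."""
    scale = 1 << fractional_bits
    return round(value * scale) / scale

def lowpass_dc_gain(corner_hz, sample_hz, fractional_bits):
    """DC gain (a0 + a1) / (1 + b1) of a bilinear first-order low-pass after rounding.

    Returns None when the rounded pole lands exactly on z = 1, where the
    DC gain of the difference equation is undefined.
    """
    k = sample_hz / (math.pi * corner_hz)
    a0 = quantize(1.0 / (1.0 + k), fractional_bits)  # a1 equals a0 for this filter
    b1 = quantize((1.0 - k) / (1.0 + k), fractional_bits)
    denominator = 1.0 + b1
    if denominator == 0.0:
        return None
    return 2.0 * a0 / denominator

# Ideal DC gain is 1; compare several hypothetical coefficient word lengths.
gains = {bits: lowpass_dc_gain(3.0, 1.5625e6, bits) for bits in (12, 16, 20, 40)}
```

Because the corner is so far below the sample rate, the pole sits extremely close to $z = 1$ and the feedforward coefficients are tiny, so coarse rounding either zeroes the numerator or shifts the pole enough to change the DC gain noticeably. Only the longer word lengths recover a gain near the ideal value.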

The closed loop transfer function behavior for both 
arms matches our expectations for noise rejection at low 
frequencies. A mismatch at higher frequencies is due to 
inadequate modelling of the PZT and other components. 
(The PZT behaves more like a collection of oscillators 
with different resonances than a low pass filter.) Quali- 
tatively, the FPGA lock was much more robust to high 
frequency noise than an analog version of the servo. This 
was likely due to the precise match to the plant dynam- 
ics near the unity gain point of the servo, achieved by 
the use of large FIR coefficients. However, the FPGA
servo was unable to retain the lock over time scales longer
than a few hours due to the saturated gain at very low
frequencies. This problem could easily be remedied by
using an analog integrator with more DC gain to replace
the FPGA PZT transfer function. The main advantage
of the FPGA is its fast, accurate response and, besides
the demonstration presented here, there is no practical
reason to use the FPGA for high-gain, low-frequency
applications.

FIG. 7: Bode plot of T_lower (transfer function leading to
PZT). The design is a low-pass filter which dominates control
below ~100 Hz.

FIG. 8: Bode plot of T_upper (transfer function leading to
VCO-AOM). The peak in phase is designed to stabilize the
plant through the unity gain point.

Finally, another feature of FPGA control is the possi- 
bility of adding logical automation to this system. Specif- 
ically, if the controller loses the lock, then the FPGA 
could be programmed to sense this condition, sweep for 
a signal, home in, and re-acquire the lock. The abstract
logical nature of VHDL code makes this task simple rel- 
ative to the procedure needed to create an acquisition 
system using standard electronics. 
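
The kind of logical automation described here maps naturally onto a small state machine. The sketch below is written in Python rather than VHDL and is purely illustrative: the state names, the use of cavity transmission as the lock indicator, and the threshold values are all assumptions, not details of the experiment.

```python
from enum import Enum

class LockState(Enum):
    LOCKED = "locked"
    SWEEPING = "sweeping"
    ACQUIRING = "acquiring"

def next_state(state, transmission, lost_threshold, found_threshold):
    """One clock tick of a hypothetical re-acquisition state machine."""
    if state is LockState.LOCKED:
        # Declare the lock lost when the monitored signal drops too low.
        return LockState.SWEEPING if transmission < lost_threshold else LockState.LOCKED
    if state is LockState.SWEEPING:
        # Keep sweeping the actuator until a resonance appears.
        return LockState.ACQUIRING if transmission > found_threshold else LockState.SWEEPING
    # ACQUIRING: servo re-engaged; fall back to sweeping if the resonance was missed.
    return LockState.LOCKED if transmission > found_threshold else LockState.SWEEPING

def run(transmissions, lost_threshold=0.3, found_threshold=0.5):
    """Step the state machine through a sequence of normalized transmission samples."""
    state = LockState.LOCKED
    history = []
    for sample in transmissions:
        state = next_state(state, sample, lost_threshold, found_threshold)
        history.append(state)
    return history

history = run([0.9, 0.1, 0.1, 0.6, 0.8])
```

In hardware each branch would also drive outputs (for example, enabling a sweep ramp or closing the servo loop), but the transition logic itself stays this compact.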

V. SUMMARY 

To demonstrate the use of programmable logic tech- 
nology in an otherwise familiar setting, we have concen- 
trated on a linear control application. We have used this 
example to convey the issues associated with a digital 
controller, including design, latency, and discretization. 
However, we have only hinted at the more interesting 
advanced applications in experimental quantum optics 
which are sure to develop more quickly because of this 
technology. FPGAs and similar devices are particularly 
suited to any physical system where non-linear mappings 
are desired between output and input variables within the 
natural dynamical time-scale. With these devices and 
sufficiently protected quantum systems in hand, the field 
of coherent quantum control may soon have enough speed 
to match the intelligence of its proposed controllers. 

ACKNOWLEDGEMENTS 

J.S. acknowledges the support of a Hertz Foundation 
Fellowship, and H.M. acknowledges the support of an A. 
P. Sloan Research Fellowship. This work was supported 
by the NSF under grant PHY-9987541, and by the ONR 
under Young Investigator Award N00014-00-1-0479. 



[1] C. A. Sackett et al., Nature 404, 256 (2000).

[2] M. R. Andrews et al., Science 273, 84 (1996).

[3] C. J. Hood et al., Science 287, 1447 (2000).

[4] S. Habib, K. Jacobs, and H. Mabuchi, Feedback control of atomic motion in an optical cavity, Unpublished.

[5] A. C. Doherty et al., Phys. Rev. A 62, 012105 (2000); A. C. Doherty et al., Phys. Rev. A 63, 062306 (2001).

[6] H. M. Wiseman and R. B. Killip, Phys. Rev. A 57, 2169 (1998); D. W. Berry and H. M. Wiseman, Phys. Rev. A 63, 013813 (2000).

[7] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2000).

[8] H. Rabitz et al., Science 288, 824 (2000).

[9] D. Stranneby, Digital Signal Processing (Newnes, 2001).



